108 research outputs found
Stationary Mixing Bandits
We study the bandit problem where arms are associated with stationary
phi-mixing processes and where rewards are therefore dependent: the question
that arises from this setting is that of recovering some independence by
ignoring the value of some rewards. As we shall see, the bandit problem we
tackle requires us to address the exploration/exploitation/independence
trade-off. To do so, we provide a UCB strategy together with a general regret
analysis for the case where the size of the independence blocks (the ignored
rewards) is fixed and we go a step beyond by providing an algorithm that is
able to compute the size of the independence blocks from the data. Finally, we
give an analysis of our bandit problem in the restless case, i.e., in the
situation where the time counters for all mixing processes simultaneously
evolve
From Cutting Planes Algorithms to Compression Schemes and Active Learning
Cutting-plane methods are well-studied localization(and optimization)
algorithms. We show that they provide a natural framework to perform
machinelearning ---and not just to solve optimization problems posed by
machinelearning--- in addition to their intended optimization use. In
particular, theyallow one to learn sparse classifiers and provide good
compression schemes.Moreover, we show that very little effort is required to
turn them intoeffective active learning methods. This last property provides a
generic way todesign a whole family of active learning algorithms from existing
passivemethods. We present numerical simulations testifying of the relevance
ofcutting-plane methods for passive and active learning tasks.Comment: IJCNN 2015, Jul 2015, Killarney, Ireland. 2015,
\<http://www.ijcnn.org/\&g
Confusion Matrix Stability Bounds for Multiclass Classification
In this paper, we provide new theoretical results on the generalization
properties of learning algorithms for multiclass classification problems. The
originality of our work is that we propose to use the confusion matrix of a
classifier as a measure of its quality; our contribution is in the line of work
which attempts to set up and study the statistical properties of new evaluation
measures such as, e.g. ROC curves. In the confusion-based learning framework we
propose, we claim that a targetted objective is to minimize the size of the
confusion matrix C, measured through its operator norm ||C||. We derive
generalization bounds on the (size of the) confusion matrix in an extended
framework of uniform stability, adapted to the case of matrix valued loss.
Pivotal to our study is a very recent matrix concentration inequality that
generalizes McDiarmid's inequality. As an illustration of the relevance of our
theoretical results, we show how two SVM learning procedures can be proved to
be confusion-friendly. To the best of our knowledge, the present paper is the
first that focuses on the confusion matrix from a theoretical point of view
Decoy Bandits Dueling on a Poset
We adress the problem of dueling bandits defined on partially ordered sets,
or posets. In this setting, arms may not be comparable, and there may be
several (incomparable) optimal arms. We propose an algorithm, UnchainedBandits,
that efficiently finds the set of optimal arms of any poset even when pairs of
comparable arms cannot be distinguished from pairs of incomparable arms, with a
set of minimal assumptions. This algorithm relies on the concept of decoys,
which stems from social psychology. For the easier case where the
incomparability information may be accessible, we propose a second algorithm,
SlicingBandits, which takes advantage of this information and achieves a very
significant gain of performance compared to UnchainedBandits. We provide
theoretical guarantees and experimental evaluation for both algorithms
Confusion-Based Online Learning and a Passive-Aggressive Scheme
International audienceThis paper provides the first ---to the best of our knowledge--- analysis of online learning algorithms for multiclass problems when the {\em confusion} matrix is taken as a performance measure. The work builds upon recent and elegant results on noncommutative concentration inequalities, i.e. concentration inequalities that apply to matrices, and, more precisely, to matrix martingales. We do establish generalization bounds for online learning algorithms and show how the theoretical study motivates the proposition of a new confusion-friendly learning procedure. This learning algorithm, called \copa (for COnfusion Passive-Aggressive) is a passive-aggressive learning algorithm; it is shown that the update equations for \copa can be computed analytically and, henceforth, there is no need to recourse to any optimization package to implement it
Unconfused Ultraconservative Multiclass Algorithms
We tackle the problem of learning linear classifiers from noisy datasets in a
multiclass setting. The two-class version of this problem was studied a few
years ago by, e.g. Bylander (1994) and Blum et al. (1996): in these
contributions, the proposed approaches to fight the noise revolve around a
Perceptron learning scheme fed with peculiar examples computed through a
weighted average of points from the noisy training set. We propose to build
upon these approaches and we introduce a new algorithm called UMA (for
Unconfused Multiclass additive Algorithm) which may be seen as a generalization
to the multiclass setting of the previous approaches. In order to characterize
the noise we use the confusion matrix as a multiclass extension of the
classification noise studied in the aforementioned literature. Theoretically
well-founded, UMA furthermore displays very good empirical noise robustness, as
evidenced by numerical simulations conducted on both synthetic and real data.
Keywords: Multiclass classification, Perceptron, Noisy labels, Confusion MatrixComment: ACML, Australia (2013
Chromatic PAC-Bayes Bounds for Non-IID Data: Applications to Ranking and Stationary -Mixing Processes
Pac-Bayes bounds are among the most accurate generalization bounds for
classifiers learned from independently and identically distributed (IID) data,
and it is particularly so for margin classifiers: there have been recent
contributions showing how practical these bounds can be either to perform model
selection (Ambroladze et al., 2007) or even to directly guide the learning of
linear classifiers (Germain et al., 2009). However, there are many practical
situations where the training data show some dependencies and where the
traditional IID assumption does not hold. Stating generalization bounds for
such frameworks is therefore of the utmost interest, both from theoretical and
practical standpoints. In this work, we propose the first - to the best of our
knowledge - Pac-Bayes generalization bounds for classifiers trained on data
exhibiting interdependencies. The approach undertaken to establish our results
is based on the decomposition of a so-called dependency graph that encodes the
dependencies within the data, in sets of independent data, thanks to graph
fractional covers. Our bounds are very general, since being able to find an
upper bound on the fractional chromatic number of the dependency graph is
sufficient to get new Pac-Bayes bounds for specific settings. We show how our
results can be used to derive bounds for ranking statistics (such as Auc) and
classifiers trained on data distributed according to a stationary {\ss}-mixing
process. In the way, we show how our approach seemlessly allows us to deal with
U-processes. As a side note, we also provide a Pac-Bayes generalization bound
for classifiers learned on data from stationary -mixing distributions.Comment: Long version of the AISTATS 09 paper:
http://jmlr.csail.mit.edu/proceedings/papers/v5/ralaivola09a/ralaivola09a.pd
Multiple Subject Learning for Inter-Subject Prediction
International audienceMulti-voxel pattern analysis has become an important tool for neuroimaging data analysis by allowing to predict a behavioral variable from the imaging patterns. However, standard models do not take into account the differences that can exist between subjects, so that they perform poorly in the inter-subject prediction task. We here introduce a model called Multiple Subject Learning (MSL) that is designed to effectively combine the information provided by fMRI data from several subjects; in a first stage, a weighting of single-subject kernels is learnt using multiple kernel learning to produce a classifier; then, a data shuffling procedure allows to build ensembles of such classifiers, which are then combined by a majority vote. We show that MSL outperforms other models in the inter-subject prediction task and we discuss the empirical behavior of this new model
MKPM: A multiclass extension to the kernel projection machine
International audienceWe introduce Multiclass Kernel Projection Machines (MKPM), a new formalism that extends the Kernel Projection Machine framework to the multiclass case. Our formulation is based on the use of output codes and it implements a co-regularization scheme by simultaneously constraining the projection dimensions associated with the individual predictors that constitute the global classifier. In order to solve the optimization problem posed by our formulation, we propose an efficient dynamic programming approach. Numerical simulations conducted on a few pattern recognition problems illustrate the soundness of our approach
Differentially Private Sliced Wasserstein Distance
Developing machine learning methods that are privacy preserving is today a
central topic of research, with huge practical impacts. Among the numerous ways
to address privacy-preserving learning, we here take the perspective of
computing the divergences between distributions under the Differential Privacy
(DP) framework -- being able to compute divergences between distributions is
pivotal for many machine learning problems, such as learning generative models
or domain adaptation problems. Instead of resorting to the popular
gradient-based sanitization method for DP, we tackle the problem at its roots
by focusing on the Sliced Wasserstein Distance and seamlessly making it
differentially private. Our main contribution is as follows: we analyze the
property of adding a Gaussian perturbation to the intrinsic randomized
mechanism of the Sliced Wasserstein Distance, and we establish the
sensitivityof the resulting differentially private mechanism. One of our
important findings is that this DP mechanism transforms the Sliced Wasserstein
distance into another distance, that we call the Smoothed Sliced Wasserstein
Distance. This new differentially private distribution distance can be plugged
into generative models and domain adaptation algorithms in a transparent way,
and we empirically show that it yields highly competitive performance compared
with gradient-based DP approaches from the literature, with almost no loss in
accuracy for the domain adaptation problems that we consider
- …